Effects of Sample Selection Bias on the Accuracy of Population Structure and Ancestry Inference

Authors

  • Suyash Shringarpure
  • Eric P. Xing
Abstract

Analysis of population stratification is an important task in genetic analyses. It provides information about the ancestry of individuals, and stratification can be an important confounder in genome-wide association studies. Public genotyping projects have made a large number of datasets available for study. However, practical constraints dictate that only a small number of individuals from any geographical or ethnic population are genotyped. The resulting data are a sample from the entire population. If the distribution of sample sizes is not representative of the populations being sampled, the accuracy of population stratification analyses of the data could be affected. We attempt to understand the effect of biased sampling on the accuracy of population structure analysis and individual ancestry recovery. We examine two commonly used methods for analyzing such datasets, ADMIXTURE and EIGENSOFT, and find that the accuracy with which population structure is recovered depends to a large extent on the sample used for analysis and on how representative it is of the underlying populations. Using simulated data and real genotype data from cattle, we show that sample selection bias can affect the results of population structure analyses. We develop a mathematical framework for sample selection bias in models for population structure and propose a correction for sample selection bias that uses auxiliary information about the sample. We demonstrate, using simulated and real data, that such a correction is effective in practice.
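The abstract does not spell out the authors' framework or their correction, so the following is only a minimal sketch of the general idea, not their method. It assumes a hypothetical setting with two equally large populations, "A" and "B", whose true shares are known as auxiliary information, and a single SNP whose allele frequencies (0.1 and 0.9) are invented for illustration. The sample over-represents population A, and inverse-probability weights (true share divided by sampled share) are used to correct a pooled allele-frequency estimate.

```python
import random


def simulate_genotypes(n, freq, rng):
    """Draw n diploid genotypes (0, 1, or 2 allele copies) at a single SNP."""
    return [int(rng.random() < freq) + int(rng.random() < freq) for _ in range(n)]


def allele_frequency(genotypes, weights=None):
    """Weighted allele-frequency estimate; every individual carries two alleles."""
    if weights is None:
        weights = [1.0] * len(genotypes)
    return sum(w * g for w, g in zip(weights, genotypes)) / (2.0 * sum(weights))


rng = random.Random(0)  # fixed seed so the sketch is reproducible

# Hypothetical values, not taken from the paper.
true_share = {"A": 0.5, "B": 0.5}    # assumed auxiliary information
freq = {"A": 0.1, "B": 0.9}          # per-population allele frequencies
sample_size = {"A": 900, "B": 100}   # biased sample: A is over-represented

genotypes, labels = [], []
for pop, n in sample_size.items():
    genotypes += simulate_genotypes(n, freq[pop], rng)
    labels += [pop] * n

n_total = len(genotypes)
# Inverse-probability weight: true population share / share in the sample.
weights = [true_share[p] / (sample_size[p] / n_total) for p in labels]

target = sum(true_share[p] * freq[p] for p in freq)   # pooled frequency, 0.5
naive = allele_frequency(genotypes)                    # pulled toward population A
corrected = allele_frequency(genotypes, weights)       # reweighted estimate

print(f"target={target:.3f} naive={naive:.3f} corrected={corrected:.3f}")
```

In this toy setting the unweighted estimate is pulled toward population A's frequency, while the reweighted estimate lands near the pooled value. Methods such as ADMIXTURE and EIGENSOFT estimate far richer quantities than a single pooled frequency, so this only illustrates why unrepresentative sample sizes matter and how auxiliary information can offset them.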

Similar resources

Supporting Information Supporting methods Estimating ancestry components in Admixture and Structure: Estimated individual ancestry from supervised and unsupervised clustering in Admixture are highly concordant for the autosomes for both migration events, and for the X chromosome for the Neolithic migration

Estimating ancestry components in Admixture and Structure: Estimated individual ancestry from supervised and unsupervised clustering in Admixture are highly concordant for the autosomes for both migration events, and for the X chromosome for the Neolithic migration scenario. For X-chromosomal ancestry estimated for the steppe migration, however, reference individuals do not emerge as clusters i...

Cut-off Sampling Design: Take all, Take Some, and Take None

Extended Abstract. Sampling is the process of selecting units (e.g., people, organizations) from a population of interest so that by studying the sample we may fairly generalize our results back to the population from which they were chosen. To draw a sample from the underlying population, a variety of sampling methods can be employed, individually or in combination. Cut-off sampling is a pr...

Effects of Marker Density, Number of Quantitative Trait Loci and Heritability of Trait on Genomic Selection Accuracy

The success of genomic selection mainly depends on the extent of linkage disequilibrium (LD) between markers and quantitative trait loci (QTL), number of QTL and heritability (h²) of the traits. The extent of LD depends on the genetic structure of the population and marker density. This study was conducted to determine the effects of marker density, level of heritability, number of QTL, and to ...

Effect of Markers Effect Estimation Methods, Population Structure and Trait Architecture on the Accuracy of Genomic Breeding Values

This study aimed to investigate the effect of the method used to estimate marker effects, QTL distribution, number of QTLs, effective population size, and trait heritability on the accuracy of genomic predictions. Two effective population sizes, 100 and 500 individuals, were simulated with the QMSim software. A 100 cM genome consisting of one chromosome was simulated, where 500 SNPs and two diffe...

The Relative Improvement of Bias Reduction in Density Estimator Using Geometric Extrapolated Kernel

One of the nonparametric procedures used to estimate densities is the kernel method. In this paper, in order to reduce the bias of kernel density estimation, methods such as the usual kernel (UK), the geometric extrapolation usual kernel (GEUK), a bias reduction kernel (BRK), and a geometric extrapolation bias reduction kernel (GEBRK) are introduced. Theoretical properties, including the selection of smoothness para...

Journal title:

Volume 4, Issue

Pages -

Publication date 2014